605 research outputs found

Estimating False Discovery Proportion Under Arbitrary Covariance Dependence

Author: Fan Jianqing
Gu Weijie
Han Xu
Publication date: 15/11/2011

Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any SNPs are associated with some traits and those tests are correlated. When test statistics are correlated, false discovery control becomes very challenging under arbitrary dependence. In the current paper, we propose a novel method based on principal factor approximation, which successfully subtracts the common dependence and weakens significantly the correlation structure, to deal with an arbitrary dependence structure. We derive an approximate expression for false discovery proportion (FDP) in large scale multiple testing when a common threshold is used and provide a consistent estimate of realized FDP. This result has important applications in controlling FDR and FDP. Our estimate of realized FDP compares favorably with Efron (2007)'s approach, as demonstrated in the simulated examples. Our approach is further illustrated by some real data applications. We also propose a dependence-adjusted procedure, which is more powerful than the fixed threshold procedure. Comment: 51 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1012.439
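
To make the quantity in the abstract above concrete, here is a minimal Python sketch of the realized FDP at a common p-value threshold: the number of true nulls rejected divided by the total number of rejections. It is not the paper's principal factor approximation estimator, which targets this quantity when the truth is unknown. The function name `realized_fdp`, the toy p-values, the null labels, and the convention that FDP is 0 when nothing is rejected are all assumptions made for illustration.

```python
from typing import Sequence


def realized_fdp(
    p_values: Sequence[float],
    is_null: Sequence[bool],
    threshold: float,
) -> float:
    """Realized FDP V(t)/R(t) at a common threshold t.

    V(t) counts rejected true nulls and R(t) counts all rejections.
    Assumed convention: FDP is 0 when nothing is rejected.
    """
    if len(p_values) != len(is_null):
        raise ValueError("p_values and is_null must have the same length")

    # A hypothesis is rejected when its p-value is at or below the threshold.
    rejected = [p <= threshold for p in p_values]
    total_rejections = sum(rejected)
    false_rejections = sum(
        1 for was_rejected, null in zip(rejected, is_null) if was_rejected and null
    )

    if total_rejections == 0:
        return 0.0
    return false_rejections / total_rejections


# Hypothetical toy data: the null labels are known only because this is a
# made-up example, as they would be in a simulation study.
p_values = [0.001, 0.02, 0.03, 0.2, 0.6, 0.004]
is_null = [False, True, False, True, True, False]

fdp = realized_fdp(p_values, is_null, threshold=0.05)
print(fdp)  # 0.25: four rejections, one of which is a true null
```

In real data the null labels are unobserved, which is why an estimate of the realized FDP, such as the one the abstract describes, is needed.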

arXiv.org e-Print Archive

CiteSeerX

Princeton University Open Access Repository

PubMed Central

Effects of Arbuscular Mycorrhizal Fungi on Accumulation of Heavy Metals in Rhizosphere Soil

Author: HE Jiayong
XU Weijie
Publication venue: Scholink Co, Ltd.
Publication date: 08/05/2018

Arbuscular mycorrhizal fungi (AMF) in rhizosphere soil affect the absorption of heavy metals by host plants, through both inhibitory and conversion effects. The type and quantity of AMF differ, and so does their absorption in the rhizosphere soil. Changes in the accumulation of heavy metals in turn affect the growth of AMF in the rhizosphere soil. This paper makes a preliminary investigation of whether the number of AMF affects the absorption of the heavy metal Cd. Experiments show that as the number of soil spores increases, the available cadmium content of the soil also tends to increase

Scholink Journals

PROCESS OF EXTRACTING HIGH QUALITY PROTEINS FROM CEREAL GRAINS AND THEIR BYPRODUCTS USING ACIDIC MEDIUM AND A REDUCING AGENT

Author: Reddy Narenda
Xu Weijie
Yang Yiqi
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 12/03/2009

The present invention is directed to a method for processing a plant-based protein source. An acidic extracting solution comprising a reducing agent is useful for extracting and isolating proteins from plant-based protein sources

Sequence-Level Certainty Reduces Hallucination In Knowledge-Grounded Dialogue Generation

Author: Sengamedu Srinivasan H.
Wan Yixin
Wu Fanyou
Xu Weijie
Publication date: 28/10/2023

Model hallucination has been a crucial interest of research in Natural Language Generation (NLG). In this work, we propose sequence-level certainty as a common theme over hallucination in NLG, and explore the correlation between sequence-level certainty and the level of hallucination in model responses. We categorize sequence-level certainty into two aspects: probabilistic certainty and semantic certainty, and reveal through experiments on Knowledge-Grounded Dialogue Generation (KGDG) task that both a higher level of probabilistic certainty and a higher level of semantic certainty in model responses are significantly correlated with a lower level of hallucination. What's more, we provide theoretical proof and analysis to show that semantic certainty is a good estimator of probabilistic certainty, and therefore has the potential as an alternative to probability-based certainty estimation in black-box scenarios. Based on the observation on the relationship between certainty and hallucination, we further propose Certainty-based Response Ranking (CRR), a decoding-time method for mitigating hallucination in NLG. Based on our categorization of sequence-level certainty, we propose 2 types of CRR approach: Probabilistic CRR (P-CRR) and Semantic CRR (S-CRR). P-CRR ranks individually sampled model responses using their arithmetic mean log-probability of the entire sequence. S-CRR approaches certainty estimation from meaning-space, and ranks a number of model response candidates based on their semantic certainty level, which is estimated by the entailment-based Agreement Score (AS). Through extensive experiments across 3 KGDG datasets, 3 decoding methods, and on 4 different models, we validate the effectiveness of our 2 proposed CRR methods to reduce model hallucination
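
The P-CRR step described in the abstract above, ranking sampled responses by the arithmetic mean log-probability of the whole sequence, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `(text, token_log_probs)` candidate format, and the toy log-probabilities are hypothetical, and in practice the per-token log-probabilities would come from the generating model.

```python
from typing import List, Tuple

# Assumed candidate format: (response text, per-token log-probabilities).
Candidate = Tuple[str, List[float]]


def mean_log_prob(token_log_probs: List[float]) -> float:
    """Arithmetic mean of the per-token log-probabilities of one sequence."""
    if not token_log_probs:
        raise ValueError("a response must contain at least one token")
    return sum(token_log_probs) / len(token_log_probs)


def rank_by_probabilistic_certainty(candidates: List[Candidate]) -> List[Candidate]:
    """Sort candidates from most to least certain (highest mean log-prob first)."""
    return sorted(candidates, key=lambda c: mean_log_prob(c[1]), reverse=True)


# Hypothetical sampled responses with made-up token log-probabilities.
candidates = [
    ("response A", [-0.2, -0.4, -0.3]),        # mean about -0.30
    ("response B", [-1.5, -0.9]),              # mean about -1.20
    ("response C", [-0.1, -0.1, -0.2, -0.2]),  # mean about -0.15
]

ranked = rank_by_probabilistic_certainty(candidates)
best_response = ranked[0][0]
print(best_response)  # response C
```

Averaging over tokens rather than summing keeps longer responses from being penalized purely for their length, which is one reason a mean log-probability is a natural sequence-level score.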

arXiv.org e-Print Archive

S2vNTM: Semi-supervised vMF Neural Topic Modeling

Author: Desai Jay
Iannacci Francis
Jiang Xiaoyu
Sengamedu Srinivasan
Xu Weijie
Publication date: 06/07/2023

Language model based methods are powerful techniques for text classification. However, the models have several shortcomings. (1) It is difficult to integrate human knowledge such as keywords. (2) They need a lot of resources to train. (3) They rely on large text data for pretraining. In this paper, we propose Semi-Supervised vMF Neural Topic Modeling (S2vNTM) to overcome these difficulties. S2vNTM takes a few seed keywords as input for topics. S2vNTM leverages the pattern of keywords to identify potential topics, as well as optimize the quality of topics' keyword sets. Across a variety of datasets, S2vNTM outperforms existing semi-supervised topic modeling methods in classification accuracy with limited keywords provided. S2vNTM is at least twice as fast as baselines. Comment: 17 pages, 9 figures, ICLR Workshop 2023. arXiv admin note: text overlap with arXiv:2307.0122

arXiv.org e-Print Archive

1-(3,5-Dimethoxybenzyl)-1H-pyrrole

Author: Huang Wei
Huo Shiyong
Li Yueqing
Zhang Xu
Zhao Weijie
Publication venue: International Union of Crystallography
Publication date: 01/05/2012

The title compound, C13H15NO2, was synthesized from 3,5-dimethoxybenzaldehyde. The dihedral angle between the pyrrole and benzene rings is 89.91 (5)°. In the crystal, weak C—H⋯O and C—H⋯π interactions link the molecules into a three-dimensional network

Directory of Open Access Journals

PubMed Central

Lifelong Sequential Modeling with Personalized Memorization for User Response Prediction

Author: Bian Weijie
Fang Yuchen
Gai Kun
Qin Jiarui
Ren Kan
Xu Jian
Yu Yong
Zhang Weinan
Zheng Lei
Zhou Guorui
Zhu Xiaoqiang
Publication venue: Association for Computing Machinery (ACM)
Publication date: 12/05/2019

User response prediction, which models the user preference w.r.t. the presented items, plays a key role in online services. After two decades of rapid development, the accumulated user behavior sequences on mature Internet service platforms have become extremely long, dating back to each user's first registration. Each user not only has intrinsic tastes, but also keeps changing her personal interests during her lifetime. Hence, it is challenging to handle such lifelong sequential modeling for each individual user. Existing methodologies for sequential modeling are only capable of dealing with relatively recent user behaviors, which leaves huge room for modeling long-term, especially lifelong, sequential patterns to facilitate user modeling. Moreover, one user's behavior may be accounted for by various previous behaviors within her whole online activity history, i.e., long-term dependency with multi-scale sequential patterns. In order to tackle these challenges, in this paper, we propose a Hierarchical Periodic Memory Network for lifelong sequential modeling with personalized memorization of sequential patterns for each user. The model also adopts a hierarchical and periodical updating mechanism to capture multi-scale sequential patterns of user interests while supporting the evolving user behavior logs. The experimental results over three large-scale real-world datasets have demonstrated the advantages of our proposed model, with significant improvement in user response prediction performance against the state of the art. Comment: SIGIR 2019. Reproducible codes and datasets: https://github.com/alimamarankgroup/HPM

arXiv.org e-Print Archive

Crossref